ABSTRACT

The Force and Motion Conceptual Evaluation (FMCE) is a
multiple-choice test that has been used to evaluate physics instruction.
However, the validity and reliability estimates have not been determined in a
way a social scientist would expect. Few psychometric data were used to
estimate the validity and reliability of the FMCE instrument. This study used
several methods to estimate the reliability and structural validity of the
FMCE instrument. Data from the first semester of a noncalculus physics course
were used to calculate Cronbach alpha reliability estimates and, using factor
analysis, evaluate the construct validity of the instrument. For the pilot
study, the pretest was given to 38 students and the posttest to 20. Fifty-four
students participated in the fall 2002 pretest. A table of
specifications also was used to estimate the content validity of the FMCE.

The pilot study suggested that the FMCE is a valid and reliable measure of
the concepts of force and motion, and the ongoing study will provide further
investigation. (Contains 4 tables and 22 references.) (Author/SLD)
Title: The Force and Motion Conceptual Evaluation
Presenter: Susan Ramlo, The University of Akron 

Abstract 

The Force and Motion Conceptual Evaluation (FMCE) is a multiple-choice test 
that has been used to evaluate physics instruction. However, the validity and reliability 
estimates have not been determined in the way a social scientist would expect. Little 
psychometric data were used to estimate the validity and reliability of the FMCE 
instrument. This study uses several methods to estimate the reliability and structural 
validity of the FMCE instrument. Data from the first semester of a non-calculus physics 
course were used to calculate Cronbach alpha reliability estimates and, using factor
analysis, evaluate the construct validity of the instrument. A table of specifications also 
was used to estimate the content validity of the FMCE. 

Physics Education 

The science literacy of most Americans has not kept pace with the role of 
science in their lives (Committee on Undergraduate Science Education, 1999). Scientific 
literacy involves the understanding of scientific concepts that occur in everyday 
experiences (National Research Council, 1996). Similarly, many students emerge from 
their study of physics with serious gaps in their understanding of important concepts 
(McDermott & Redish, 1999). Thus, much of the research in physics education has 
focused on conceptual understanding or problem solving performance (McDermott & 
Redish, 1999). Strong problem solving skill and conceptual knowledge are the two most 
important goals of physics instruction (Mestre, Dufresne, Gerace, & Hardiman, 1993).

Lawson (1995) stated that a concept has been formed whenever two or more types
of knowledge have been grouped or classified together and set apart from other types of
knowledge, based on some common feature, form, or property. Learning important
science concepts and principles is difficult because learners’ everyday experiences
create resistance to conceptual change. Students’ pre-instructional conceptions are
especially difficult to change in science fields where those conceptions are deeply
rooted in daily life experiences (Duit & Treagust, 1998). Such is the case with physics,
especially in the domain of mechanics (force and motion) (Gil-Perez & Carrascosa, 1990).

Conceptual change has been defined as the occurrence of changes either within or 
between existing knowledge structures (Hewson, Beeth, & Thorley, 1998). Students 
must integrate all of their conceptual understandings (Duit & Treagust, 1998; Hewson et
al., 1998; Howe, 1996; Piaget, 1995; Posner, Strike, Hewson, & Gertzog, 1982;
Vygotsky, 1986). In general, learners resist changing their current conceptions of
physical events (Arons, 1990; Gil-Perez & Carrascosa, 1990; Philips, 1991; Posner et al.,
1982). A radical conceptual change involves a shift between two epistemologically 
distinct categories. An example of such a shift would be from thinking of force as an 
entity to force as an event or a process (Hewson et al., 1998). As Vygotsky (1986) and 
others (Duit & Treagust, 1998; Hewson et al., 1998; Posner et al., 1982) pointed out, 
conceptual change is an ongoing process. 

Student misconceptions are often based on ideas about particular situations
(Dykstra et al., 1992). Dykstra et al. (1992) used questions from the Force and Motion
Conceptual Evaluation (FMCE) to examine student misconceptions. For example,
through student interviews, the researchers found that the students’ “motion implies
force” conception led students to suggest that objects in motion must be experiencing a
force. For instance, students suggested that a force (in addition to gravity) is needed to
propel a block down an inclined plane because it is moving down the plane (Dykstra et
al., 1992). These findings are consistent with those reported by Thornton (1996).

Thornton (1997, p. 251) summarized the views of force and motion as follows: 

Physicist View (model): The relationship between force and motion is very 
coherent. If there is an acceleration, there is a force in the direction of the 
acceleration and vice versa. The force is proportional to the acceleration and to 
the mass of the object (F=ma). The state of motion of the object (e.g. moving or 
still, slowing down or speeding up), the identity of the object, and the source of 
force on the object do not alter the force/motion relationship. 

Student View: The rules that relate the force to the acceleration and/or velocity 
for an object can be different for an object speeding up, standing still, moving at 
constant velocity, or slowing down. The identity of object, the source of force, 
and the specific situation can sometimes change the view of force required for a 
particular motion. More than one view may be held at the same time. 

Thornton (1997) added that, from a physicist’s point of view, the student views of
force and motion lack generalizability. However, the student views are not arbitrary.
Students in physics or physical science courses in American colleges and secondary
schools hold a limited number of common views and combinations of views (Thornton,
1997). However, in order to evaluate student conceptual understanding of force and
motion, instruments that can reliably and validly measure these concepts are necessary.

Evaluating the concepts of force and motion 
The study by Dykstra et al. (1992) used the FMCE in a qualitative study, as 
described above. A study by Svec (1999) is an example of how the FMCE has been used 
in quantitative studies. In his study, Svec compared microcomputer-based laboratory 
(MBL) instruction and traditional laboratory instruction relative to the learning of graph 
interpretation skills and motion concepts. This study used students from two different 
undergraduate introductory physics courses offered in the same state but at two different,
large, midwestern universities. Multiple-choice tests included graphing and
non-graphing questions on motion concepts, adapted from the Force and Motion Conceptual
Evaluation by Thornton (1996), the Mechanics Diagnostic Test by Halloun and Hestenes 
(1985), and the Force Concept Inventory by Hestenes, Wells, & Swackhamer (1992). 

The effect sizes were calculated for specific types of questions but not specific concepts. 
The question types consisted of graphing interpretation skills, conceptual understanding 
related to the 34 questions that used graphs, and the textual motion concept questions. 
Gain scores, post-test minus pretest, were used to determine the effect sizes (Svec, 1999). 
When researchers use gain scores, they are assuming that the factors of the pretest and 
post-test are the same (Brown et al., 2002). However, no studies investigating the
pretest and/or post-test factor structure of the FMCE were found in the literature. 
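The gain-score calculation described above can be sketched in a few lines. This is only an illustration: the source does not state which effect-size formula Svec (1999) used, so the pooled-standard-deviation form (Cohen's d) is an assumption, as are the function names and the made-up scores. Gain scores require matched pretest and post-test scores for the same students.

```python
import numpy as np

def gain_scores(pre, post):
    """Gain score per student: post-test minus pretest (matched students only)."""
    return np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)

def cohens_d(group_a, group_b):
    """Standardized mean difference using a pooled standard deviation.

    Assumption: Cohen's d is used here for illustration; the source does not
    specify the effect-size formula.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (
        len(a) + len(b) - 2
    )
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical scores for two instructional groups (not data from any study).
mbl_gain = gain_scores(pre=[5, 6, 7, 8], post=[10, 12, 13, 15])
trad_gain = gain_scores(pre=[5, 6, 7, 8], post=[7, 9, 10, 12])
effect_size = cohens_d(mbl_gain, trad_gain)
```

Subtracting pretest from post-test scores in this way is meaningful only if both administrations measure the same constructs, which is the assumption about factor structure noted above.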

The Development of the Force and Motion Conceptual Evaluation 
The FMCE was developed by Ronald Thornton and the Center for Science and 
Mathematics Teaching at Tufts University. They constructed the FMCE from earlier 
testing of students using free response questions requiring written answers and the 
drawing of graphs (Thornton, 1993; Thornton & Sokoloff, 1998). The FMCE is a 
multiple-choice test that consists of 47 questions. A copy of the FMCE is in the 
Appendix. Students choose from a list of five to nine answers for each question. The 
questions target concepts of velocity, acceleration, and force (Thornton, 1993, 1996, 
1997; Thornton & Sokoloff, 1997, 1998). The questions use graphical representations 
and “natural language” (story problem) contexts. The natural language questions do not 
involve any coordinate system references and do not explicitly describe the force acting
(Thornton & Sokoloff, 1998). 

The FMCE was developed for a number of reasons. Multiple-choice questions take
less time, and analyzing large samples requires less effort. More
importantly, the evaluation of the FMCE is less subjective than the earlier open-response
questionnaire (Thornton, 1993). Selections from the FMCE have been used in a number 
of similar studies to measure student Newtonian conceptual understanding of force and/or 
motion (Thornton, 1993, 1996, 1997; Thornton & Sokoloff, 1990, 1997, 1998). 

Validity and Reliability of the FMCE in the Literature 

The evidence given in the literature regarding the validity and reliability of the 
FMCE lacks the amount of psychometric data that a social science researcher would
expect. Validity is the most important characteristic of any test. Validity of a test is 
defined as the degree to which a test measures what it is intended to measure. Reliability 
of a test is defined as the consistency of the measure (Newman & Newman, 1994). 

Discussions in the literature regarding estimates of the reliability of the FMCE
have included statements such as:

1. Ninety-five percent of all responses were consistent with the most common
student model or with a Newtonian model (Thornton, 1996).

2. Students appear to give “almost no random answers” (Thornton, 1993, p.
9) and guessing requires students to select from up to nine answers
(Thornton & Sokoloff, 1998).

3. Studies that included over 5000 college and university physics students
showed that the pretest results vary little from year to year (Thornton,
1993).

Similarly, discussions in the literature regarding the FMCE’s validity have included
statements such as:

1. Thornton (1996) found that student answers to the multiple-choice graphical
format questions correlated with the answers given for questions probing the same 
concepts but asked in a very different format. 

2. Ninety-five percent of students interviewed gave verbal explanations of velocity 
and acceleration that were consistent with their earlier graph choices on the 
FMCE (Thornton, 1993). 

3. Students who answered force graph (problems 14 through 21) and sled questions 
(problems 1 through 7) on the FMCE from a Newtonian viewpoint were able to 
answer other previously unseen questions about force from a Newtonian view 
(Thornton, 1996). 

4. The free responses of more than 200 students matched their multiple-choice
answers on the FMCE more than 98% of the time (Thornton, 1996).

Thus, the investigators who have attempted to examine the validity and reliability 
of the FMCE instrument were not trying to establish these estimates in the way a social 
scientist would expect. Little psychometric data were used to estimate the validity and 
reliability of the FMCE instrument. A literature search did not reveal research where the 
FMCE instrument was evaluated for construct validity. No reliability estimates were 
calculated in the literature using a measure of internal consistency such as Cronbach
alpha (Thornton, 1993, 1996, 1997; Thornton & Sokoloff, 1990, 1997, 1998). Therefore,
a preliminary pilot study was conducted to investigate the reliability and validity of the
FMCE during the Spring 2002 semester. A second study is being conducted during the
Fall 2002 and Spring 2003 semesters that will include additional estimates of the
reliability and validity of the FMCE.

S. Ramlo, MWERA 2002

Evaluation of the FMCE 

This paper includes the pretest and post-test results from the Spring 2002 pilot 
study and the pretest results from the Fall 2002 semester investigation. The entire Force 
and Motion Conceptual Evaluation (FMCE) was used in both of these studies. Pretests 
and post-tests consisted of the same multiple-choice questions. A copy of the FMCE is in 
the Appendix. 

For the two studies, students took the pretest and completed a questionnaire 
during the first week of the semester during the lab period. For the pilot study, the post- 
test was given during the last laboratory meeting, week 14, of the spring semester. The 
pilot study had a pretest sample size of 38 and a post-test sample size of 20. Fifty-four 
students participated in the Fall 2002 pretest study. 

The participants in both studies were enrolled in the Technical Physics:
Mechanics I and/or II courses. These two half-semester courses are offered consecutively
each semester within the Community and Technical College (C&T). The C&T is on the
main campus of a large midwestern state university and offers both associate and
bachelor degrees. The Technical Physics courses serve students in six different associate
degree programs in engineering technology. The lecture portion of the course consists of
2.5 hours of class time per week spread over two or three class meetings per week. The
laboratory meets once a week for 2.5 hours. The lectures have 25 or fewer students
enrolled. The associated laboratories have a limit of 16 students. Students must take the
associated laboratory for each half-semester at the same time they take the course.
During the laboratory, all students worked in collaborative, self-selected groups of 2 to 4
students. All laboratories used the Realtime Physics MBL Laboratories discussed in
Thornton and Sokoloff (1997).

All participants were engineering technology majors. Students in the pilot study 
had an average age of 25 and 92% of the students were male. Forty-two percent of the 
pilot study students had taken a prior physics course in high school and/or college. 
Similarly, participants in the Fall 2002 study had an average age of 24 and 87% were 
male. Thirty-seven percent of the participants in the later study had taken high school
physics and 4% had taken a prior college-level physics course.

Statistical Treatment 

Reliability estimates were calculated using Cronbach alpha, a measure of internal 
consistency. An R-factor analysis was conducted using the principal components method 
with Varimax rotation. R-factor analysis uses a data set where the columns are variables 
and the rows are participants. Varimax is an orthogonal factor rotation method and is the 
most frequently used rotation method (Stevens, 2002). Orthogonal solutions are more 
stable and easier to interpret than oblique solutions (Stevens, 2002). In addition, the 
factor analysis was run with ones in the matrix diagonal. This is frequently referred to as 
component analysis. An eigenvalue cutoff of one and a scree-test were used to
determine when to stop factoring. The scree-test is a graphical method where the
magnitudes of the eigenvalues are plotted against their ordinal numbers (Stevens, 2002).
This confirmatory factor analysis enabled the researcher to evaluate the construct validity
of the FMCE instrument.
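As an illustration of the internal-consistency estimate named above, Cronbach alpha for k items equals k/(k - 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below is not the software used in this study. It assumes each question is scored 1 (correct) or 0 (incorrect); the function name and the demonstration responses are hypothetical.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach alpha for a score matrix.

    scores: rows are participants, columns are items (the R-factor layout).
    """
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (n_items / (n_items - 1)) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical responses: 5 participants, 4 items (not FMCE data).
demo_scores = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
demo_alpha = cronbach_alpha(demo_scores)
```

For these made-up responses the item variances sum to 1.0 and the total-score variance is 2.5, so alpha is (4/3)(1 - 1.0/2.5) = 0.8.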

FMCE Evaluation Results 

The mean of the 38 scores on the pilot study pretest was 7.74 with a standard 
deviation of 2.88. Similarly, the mean pretest score for the 54 participants in the Fall 
2002 semester was 7.80 with a standard deviation of 3.40. For the pilot study, the mean 
post-test score was 15.57 and the standard deviation was 9.09. Twenty-one students took 
the post-test. Of these, 17 had also taken the pretest. 

The Cronbach alpha test estimated the reliability of the FMCE instrument at 0.50
at pretest and 0.94 at post-test for the pilot study data. Similarly, Cronbach alpha analysis
of the pretest data from the Fall 2002 study gave an estimated reliability of 0.54. In
addition, the pretest and post-test data were factor analyzed to examine the construct
validity of the FMCE. A scree-plot was used to determine the number of factors for
pretest and post-test. Based on these results, eigenvalue-cutoff values of 3 and 2.5 were
used for both sets of pretest data and for the post-test data, respectively. These
eigenvalue cutoffs yielded three factors for the pretests and five factors for the post-test.
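The eigenvalue-cutoff step can be sketched as follows. Component analysis with ones in the matrix diagonal amounts to taking the eigenvalues of the item correlation matrix, and the number of components retained is the number of eigenvalues above the chosen cutoff. The simulated scores, the function names, and the cutoff of one used in the demonstration are assumptions for illustration; they are not the FMCE responses or the cutoffs of 3 and 2.5 reported above.

```python
import numpy as np

def component_eigenvalues(scores):
    """Eigenvalues of the item correlation matrix, largest first.

    scores: rows are participants, columns are items. An item with no
    variance (every participant gave the same answer) has undefined
    correlations and would need to be dropped first.
    """
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order

def count_components(eigenvalues, cutoff=1.0):
    """Number of components whose eigenvalue exceeds the cutoff."""
    return int(np.sum(np.asarray(eigenvalues) > cutoff))

# Simulated data: 60 participants, 6 items driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 2))
items = latent[:, [0, 0, 0, 1, 1, 1]] + 0.5 * rng.normal(size=(60, 6))

eigenvalues = component_eigenvalues(items)
n_retained = count_components(eigenvalues, cutoff=1.0)
```

Plotting `eigenvalues` against their rank (1, 2, 3, ...) gives the scree plot; the point where the curve levels off is the visual counterpart of the numeric cutoff.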

Factor structure was determined by using only clean variables (variables with 
loadings greater than .3 on no more than one factor). For both of the pretests, the factor 
structure did not reveal a distinct pattern. This was expected since the pretest was given 
before any instruction. 
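The clean-variable rule can be expressed as a short filter over a rotated loading matrix. This sketch makes two assumptions: the .3 threshold is applied to the absolute value of each loading, so that negative loadings count, and a question is treated as clean only if it exceeds the threshold on exactly one factor, since a question that loads on no factor cannot be assigned to one. The loading matrix below is hypothetical.

```python
import numpy as np

def clean_variables(loadings, threshold=0.3):
    """Map each clean item (row index) to the single factor it loads on.

    loadings: rows are items, columns are factors.
    Items with |loading| > threshold on zero factors or on two or more
    factors are omitted.
    """
    salient = np.abs(np.asarray(loadings, dtype=float)) > threshold
    assignments = {}
    for item, row in enumerate(salient):
        if row.sum() == 1:
            assignments[item] = int(np.argmax(row))  # index of the one salient factor
    return assignments

# Hypothetical rotated loadings for 4 items on 2 factors.
demo_loadings = [
    [0.66, 0.10],   # clean, factor 0
    [0.20, -0.47],  # clean, factor 1 (negative loading)
    [0.56, 0.45],   # loads on both factors, not clean
    [0.05, 0.12],   # loads on neither factor
]
demo_clean = clean_variables(demo_loadings)
```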

For the pilot study, the clean loadings of the post-test questions indicated a 
distinct factor structure. The factor structure at post-test contained five factors: (1) 
Concepts regarding Newton’s first and second laws; (2) Newton’s third law concepts; (3) 
Concepts regarding gravitational force; (4) Velocity concepts; and (5) Acceleration
concepts. The question distribution among the first four of the factors is given in the 
tables below. The concept that each question is measuring was determined by a table of 
specifications described in the next section of this paper. 

Table 1

Questions loading on the “Concepts regarding Newton’s first and second laws” factor

Question no.    Factor loading    Main question concept
2               .660              Force
3               .685              Force
4               .894              Force
5               .777              Force
6               .894              Force
7               .586              Force
8               .894              Force
9               .894              Force
16              .894              Force
18              .689              Force
19              .894              Force
20              .466              Force
21              .894              Force

Table 2

Questions loading on the “Newton’s third law concepts” factor

Question no.    Factor loading    General question concept
15              -.467             Force
17              .802              Force
30              .861              Force
32              .810              Force
34              .883              Force
35              .611              Force
36              .807              Force
37              .526              Force
38              .894              Force
39              .639              Force

Table 3

Questions loading on the “Concepts regarding gravitational force” factor

Question no.    Factor loading    General question concept
10              .670              Force
11              .802              Force
12              .634              Force
13              .802              Force
14              .802              Force

Table 4

Questions loading on the “Velocity concepts” factor

Question no.    Factor loading    General question concept
40              .617              Velocity
41              .812              Velocity
42              .538              Velocity
43              .827              Velocity

The fifth factor, “Acceleration concepts”, had only one question, 22, load on it 
cleanly. Question 22 is one of eight questions on the FMCE that deal with analyzing 
acceleration graphs. Of the remaining seven questions that dealt with acceleration 
graphs, all had loadings of .564 or higher on the acceleration factor. However, these 
questions also loaded on at least one other factor that was related to force. Since 
Newton’s Second Law of Motion demonstrates the relationship between force and 
acceleration, the dirty loadings of most of the acceleration questions make physical
sense. 

Table of Specifications 

Four experts evaluated the FMCE. Each expert had a minimum of a master’s 
degree in physics or a related field. These evaluators examined each question of the 
FMCE and indicated what main concept was being measured by the question and then 
rated how well that question measured that concept using a scale of 1 to 100%. For 40 
out of 47 questions, the experts agreed 100% on the concept being measured. The 
concept agreement was 75% for the remaining seven questions. Ratings of how well
each question measured the chosen concept ranged from 46 to 83%. These results 
indicate strong content validity of the FMCE instrument. 
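The concept-agreement figures can be read as the share of experts who chose the most common concept for a question: with four experts, full agreement gives 100% and three of four gives 75%. The sketch below illustrates that reading; the question labels and expert choices are hypothetical, and the source does not describe how the agreement percentages were computed.

```python
from collections import Counter

def concept_agreement(expert_choices):
    """Fraction of experts who selected the most frequently chosen concept."""
    counts = Counter(expert_choices)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(expert_choices)

# Hypothetical concept choices from four experts for two questions.
expert_table = {
    "question_a": ["Force", "Force", "Force", "Force"],
    "question_b": ["Acceleration", "Force", "Acceleration", "Acceleration"],
}
agreement = {q: concept_agreement(c) for q, c in expert_table.items()}
```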

Discussion of the Results 

In general, a test is considered reliable if its reliability estimate is 0.9 or higher 
(Newman & Newman, 1994). The Cronbach alpha test estimated the reliability of the 
FMCE instrument at 0.94 at post-test. The factor structure results from the post-test 
suggested strong construct validity for the FMCE. Construct validity is most important if 
a test score is to be interpreted as representing a measure of some particular construct or 
attribute (Newman & Newman, 1994). The content and expert validity from the table of
specifications additionally indicate strong validity for the FMCE instrument.

Implications and Further Research 

The pilot study results indicated that the FMCE is a valid and reliable measure of 
the concepts of force and motion. However, the small number of participants limited this 
pilot study. In addition, only 17 of the pilot study participants who took the pretest also 
took the post-test. Subsequent investigation of the FMCE validity and reliability needs to 
take place where the number of participants is larger and where only those participants 
taking both the pretest and post-test are used in the analysis of the construct validity. The 
study currently in progress will enable this type of analysis of the FMCE. In addition, the 
validity and reliability of the FMCE should be investigated in other classroom situations 
at both the high school and college level. Finally, an investigation that compares the 
FMCE factor structures of males and females is warranted. 